04:00
2026-06-06
arxiv.org
artificial-intelligence
Agents' Last Exam
Researchers introduced Agents' Last Exam (ALE), a new benchmark designed to evaluate AI agents on long-horizon, economically valuable, real-world tasks with verifiable outcomes. Developed with over 25…